12 research outputs found

    Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

    Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.

    Multi-unit Bilateral Trade

    We characterise the set of dominant strategy incentive compatible (DSIC), strongly budget balanced (SBB), and ex-post individually rational (IR) mechanisms for the multi-unit bilateral trade setting. In such a setting there is a single buyer and a single seller who holds a finite number k of identical items. The mechanism has to decide how many units of the item are transferred from the seller to the buyer and how much money is transferred from the buyer to the seller. We consider two classes of valuation functions for the buyer and seller: valuations that are increasing in the number of units in possession, and the more specific class of valuations that are increasing and submodular. Furthermore, we present approximation results on the social welfare achieved by certain such mechanisms: for increasing submodular valuation functions, we show the existence of a deterministic 2-approximation mechanism and a randomised e/(e-1)-approximation mechanism, matching the best known bounds for the single-item setting.
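
    The three properties named in this abstract can be illustrated on the simplest case, a single item traded at a fixed posted price. The sketch below is only an illustration under that assumption; it is not the multi-unit characterisation from the paper, and the names `Outcome`, `fixed_price_mechanism`, `buyer_utility`, and `seller_utility` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    trade: bool             # whether the item moves from seller to buyer
    buyer_pays: float       # money leaving the buyer
    seller_receives: float  # money arriving at the seller


def fixed_price_mechanism(buyer_report: float, seller_report: float,
                          price: float) -> Outcome:
    """Single-item fixed-price mechanism (illustrative assumption).

    Trade happens at `price` only if the buyer reports a value of at
    least `price` and the seller reports a value of at most `price`.
    The price does not depend on the reports.
    """
    if buyer_report >= price >= seller_report:
        return Outcome(True, price, price)
    return Outcome(False, 0.0, 0.0)


def buyer_utility(true_value: float, outcome: Outcome) -> float:
    # Value of the item if obtained, minus the payment.
    return (true_value if outcome.trade else 0.0) - outcome.buyer_pays


def seller_utility(true_value: float, outcome: Outcome) -> float:
    # Payment received, minus the value of the item if given up.
    return outcome.seller_receives - (true_value if outcome.trade else 0.0)


if __name__ == "__main__":
    result = fixed_price_mechanism(buyer_report=7.0, seller_report=3.0, price=5.0)
    print(result, buyer_utility(7.0, result), seller_utility(3.0, result))
```

    Because the payment is the same constant for both sides, money is neither injected nor burnt (SBB); a truthful agent never trades at a loss (ex-post IR); and since a report only decides whether trade happens at the fixed price, misreporting cannot help (DSIC).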

    Grounding or Guesswork? Large Language Models are Presumptive Grounders

    Effective conversation requires common ground: a shared understanding between the participants. Common ground, however, does not emerge spontaneously in conversation. Speakers and listeners work together to both identify and construct a shared basis while avoiding misunderstanding. To accomplish grounding, humans rely on a range of dialogue acts, like clarification (What do you mean?) and acknowledgment (I understand.). In domains like teaching and emotional support, carefully constructing grounding prevents misunderstanding. However, it is unclear whether large language models (LLMs) leverage these dialogue acts in constructing common ground. To this end, we curate a set of grounding acts and propose corresponding metrics that quantify attempted grounding. We study whether LLMs use these grounding acts, simulating them taking turns from several dialogue datasets, and comparing the results to humans. We find that current LLMs are presumptive grounders, biased towards assuming common ground without using grounding acts. To understand the roots of this behavior, we examine the role of instruction tuning and reinforcement learning with human feedback (RLHF), finding that RLHF leads to less grounding. Altogether, our work highlights the need for more research investigating grounding in human-AI interaction. Comment: 16 pages, 2 figures

    Learning Stackelberg Equilibria and Applications to Economic Design Games

    We study the use of reinforcement learning to learn the optimal leader's strategy in Stackelberg games. Learning a leader's strategy has an innate stationarity problem: when optimizing the leader's strategy, the followers' strategies might shift. To circumvent this problem, we model the followers via no-regret dynamics to converge to a Bayesian Coarse-Correlated Equilibrium (B-CCE) of the game induced by the leader. We then embed the followers' no-regret dynamics in the leader's learning environment, which allows us to formulate our learning problem as a standard POMDP. We prove that the optimal policy of this POMDP achieves the same utility as the optimal leader's strategy in our Stackelberg game. We solve this POMDP using actor-critic methods, where the critic is given access to the joint information of all the agents. Finally, we show that our methods are able to learn optimal leader strategies in a variety of settings of increasing complexity, including indirect mechanisms where the leader's strategy is setting up the mechanism's rules.
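
    To make the leader/follower structure concrete, here is a minimal sketch of the solution concept itself on a tiny two-player matrix game. It is not the reinforcement learning or no-regret method from the abstract: it assumes a single follower, pure strategies only, brute-force enumeration, and ties broken towards the lowest action index. The function names and the example payoffs are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # payoff[leader_action][follower_action]


def best_response(follower_payoff: Matrix, leader_action: int) -> int:
    """Follower's best reply to a committed leader action.

    Ties are broken towards the lowest index (an assumption of this sketch).
    """
    row = follower_payoff[leader_action]
    return max(range(len(row)), key=lambda a: row[a])


def pure_stackelberg(leader_payoff: Matrix,
                     follower_payoff: Matrix) -> Tuple[int, int, float]:
    """Enumerate leader commitments and keep the one with the best outcome.

    Returns (leader_action, follower_action, leader_utility).
    """
    best: Tuple[int, int, float] = (-1, -1, float("-inf"))
    for i in range(len(leader_payoff)):
        j = best_response(follower_payoff, i)
        u = leader_payoff[i][j]
        if u > best[2]:
            best = (i, j, u)
    return best


if __name__ == "__main__":
    # Hypothetical 2x2 game in which commitment helps the leader:
    # without commitment action 0 dominates for the leader, yet
    # committing to action 1 induces a more favourable follower reply.
    leader = [[1.0, 3.0], [0.0, 2.0]]
    follower = [[1.0, 0.0], [0.0, 1.0]]
    print(pure_stackelberg(leader, follower))
```

    The enumeration only works because the follower's reaction is computed exactly for each commitment; a learning-based approach replaces that exact best response with a follower that adapts over time.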

    Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer's disease

    The diagnosis of Alzheimer's disease (AD) in routine clinical practice is most commonly based on subjective clinical interpretations. Quantitative electroencephalography (QEEG) measures have been shown to reflect neurodegenerative processes in AD and might qualify as affordable and thereby widely available markers to facilitate the objectivization of AD assessment. Here, we present a novel framework combining Riemannian tangent space mapping and elastic net regression for the development of brain atrophy markers. While most AD QEEG studies are based on small sample sizes and psychological test scores as outcome measures, here we train and test our models using data of one of the largest prospective EEG AD trials ever conducted, including MRI biomarkers of brain atrophy. Comment: Presented at NIPS 2017 Workshop on Machine Learning for Health

    A Bayesian approach to analyzing phenotype microarray data enables estimation of microbial growth parameters

    Biolog phenotype microarrays enable simultaneous, high throughput analysis of cell cultures in different environments. The output is high-density time-course data showing redox curves (approximating growth) for each experimental condition. The software provided with the Omnilog incubator/reader summarizes each time-course as a single datum, so most of the information is not used. However, the time courses can be extremely varied and often contain detailed qualitative (shape of curve) and quantitative (values of parameters) information. We present a novel, Bayesian approach to estimating parameters from Phenotype Microarray data, fitting growth models using Markov Chain Monte Carlo methods to enable high throughput estimation of important information, including length of lag phase, maximal "growth" rate and maximum output. We find that the Baranyi model for microbial growth is useful for fitting Biolog data. Moreover, we introduce a new growth model that allows for diauxic growth with a lag phase, which is particularly useful where Phenotype Microarrays have been applied to cells grown in complex mixtures of substrates, for example in industrial or biotechnological applications, such as worts in brewing. Our approach provides more useful information from Biolog data than existing, competing methods, and allows for valuable comparisons between data series and across different models.

    Market intermediation: information, computation, and incentives

    Auctions are a major field of interest in game theory and in the wider microeconomics area, reflected by recognitions such as Nobel prizes to William Vickrey and Paul Milgrom. The algorithmic game theory literature too provides discussion of a wide range of different auction settings. But real-life markets are rarely comprised of a single monopolist facing buyers without alternative. We therefore explore market intermediation, in which we aim to match buyers and sellers to achieve some objective. While auctions have been well-explored in manifold variations, intermediation has received less attention in the literature. We aim to move beyond the independent, single-unit case and explore the limits of what can be achieved in more complex scenarios. In the first part, we look at a correlated-priors setting. We show that the revenue-optimal mechanism for this can be computed using a polynomial-time algorithm for one buyer and one seller. For two or more buyers we show that, in contrast, this problem is NP-hard, but that truthful-in-expectation mechanisms can be computed using an LP in polynomial time for a fixed number of buyers and sellers. In this setting we further discuss how market intermediation relates to classical auctions, as well as reverse auctions. Further motivating our results, we show that our discussion of market intermediation can lead back to useful results for both of these settings, giving an improved algorithm for the optimal two-bidder auction, and showing for the first time that a reverse auction behaves differently than an auction. In the second part, we consider an online intermediation setting, in which the market maker encounters an unknown sequence of buyers and sellers one at a time, with knowledge of their independent priors. We explore this from the point of view of online algorithms and competitive analysis, comparing against an offline adversary who knows the buyer-seller sequence in advance. For the general case, we show that the competitive ratio of the intermediary’s revenue grows as the square root of the number of buyers and sellers. In contrast, we consider two settings with natural restrictions: one in which the sequence is balanced, and one in which there is an upper limit on the number of items the intermediary is allowed to hold at any one time. For both these settings we show that the competitive ratio is constant. Finally, in the third part we explore multi-unit intermediation. In this, we consider one seller and one buyer, each having a concave valuation of a number of items. The intermediary’s aim will be to maximise welfare, while maintaining budget balance. This setting has been explored for the single-item case, along with simple reductions for divisible goods to that case. We will give a strong characterisation result as well as approximation guarantees for the multi-unit case.

    Cell-specific activation of the Nrf2 antioxidant pathway increases mucosal inflammation in acute but not in chronic colitis

    BACKGROUND AND AIMS: The transcription factor Nrf2 is a major modulator of the cellular antioxidant response. Oxidative burst of infiltrating macrophages leads to a massive production of reactive oxygen species in inflamed tissue of inflammatory bowel disease patients. This oxidative burst contributes to tissue destruction and epithelial permeability, but it is also an essential part of the antibacterial defence. We therefore investigated the impact of the Nrf2 orchestrated antioxidant response in both acute and chronic intestinal inflammation. METHODS: To study the role of Nrf2 overexpression in mucosal inflammation, we used transgenic mice conditionally expressing a constitutively active form of Nrf2 [caNrf2] either in epithelial cells or in the myeloid cell lineage. Acute colitis was induced by dextran sulphate sodium [DSS] in transgenic and control animals, and changes in gene expression were evaluated by genome-wide expression studies. Long-term effects of Nrf2 activation were studied in mice with an IL-10 (-/-) background. RESULTS: Expression of caNrf2 either in epithelial cells or myeloid cells resulted in aggravation of DSS-induced acute colitis. Aggravation of inflammation by caNrf2 was not observed in the IL-10 (-/-) model of spontaneous chronic colitis, where even a trend towards reduced prolapse rate was observed. CONCLUSIONS: Our findings show that a well-balanced redox homeostasis is as important in epithelial cells as in myeloid cells during induction of colitis. Aggravation of acute DSS colitis in response to constitutive Nrf2 expression emphasises the importance of tight regulation of Nrf2 during the onset of intestinal inflammation.